Create bar or line charts showing monthly total kBTUs of residential and commercial electricity and gas consumption for the Bay Area (meaning the sum of all ZIP codes in the 9 Bay Area counties) from 2017 to the latest available month (meaning a 42-column version of the plots from section 1.8). Look online for the correct conversion of kWhs to kBTUs and therms to kBTUs. Make sure that electricity and gas data are distinguishable but plotted in the same chart; feel free to separate your analyses for residential and commercial into two separate charts if you believe it improves legibility, or do one chart with 4 colors in the legend. Comment on any observable changes in energy consumption that may be attributable to the COVID-19 pandemic. Create at least one map that highlights which neighborhoods experienced the greatest change in electricity consumption before and after the pandemic began, and comment on your results. Explain any key assumptions you made in the analysis, or caveats about the data sources that you think the reader should be aware of. Publish all of this work in a web report.
The following libraries and packages are used in this analysis:
knitr::opts_chunk$set(warning = F, message = F)
library(tidyverse)
library(ggplot2)
library(leaflet)
library(tigris)
library(sf)
library(leaflet)
library(plotly)
library(zoo)
library(knitr)
The following code imports electricity usage datasets from Pacific Gas & Electric (PG&E) between the years 2017 and 2020, located at: https://pge-energydatarequest.com/public_datasets
years <- 2017:2020
quarters <- 1:4
types <- "Electric"
pge_elec <- NULL
for(quarter in quarters){
for(year in years){
for(type in types){
filename <- paste0("PGE_", year, "_Q", quarter, "_", type, "UsageByZip.csv")
if(year == 2020 && quarter %in% 3:4){
next
}
temp <- read_csv(filename)
}
pge_elec <- rbind(pge_elec, temp)
saveRDS(pge_elec, "pge_elec.rds")
}
}
#Conversion factors to convert KWH to kBTU
pge_elec$TOTALKBTU <- pge_elec$TOTALKWH*3.412
pge_elec$AVERAGEKBTU <- pge_elec$AVERAGEKWH*3.412
pge_elec$TOTALKWH <- NULL
pge_elec$AVERAGEKWH <- NULL
Now, I import gas usage data from Pacific Gas & Electric (PG&E) between the years 2017 and 2020, located at: https://pge-energydatarequest.com/public_datasets
years <- 2017:2020
quarters <- 1:4
types <- "Gas"
pge_gas <- NULL
for(quarter in quarters){
for(year in years){
for(type in types){
filename <- paste0("PGE_", year, "_Q", quarter, "_", type, "UsageByZip.csv")
if(year == 2020 && quarter %in% 3:4){
next
}
temp <- read_csv(filename)
}
pge_gas <- rbind(pge_gas, temp)
saveRDS(pge_gas, "pge_gas.rds")
}
}
pge_gas$TOTALKBTU <- pge_gas$TOTALTHM*100
pge_gas$AVERAGEKBTU <- pge_gas$AVERAGETHM*100
pge_gas$TOTALTHM <- NULL
pge_gas$AVERAGETHM <- NULL
Binding the two datasets together:
pge_elec_gas <- rbind(pge_gas, pge_elec)
pge_elec_gas
## # A tibble: 185,357 x 8
## ZIPCODE MONTH YEAR CUSTOMERCLASS COMBINED TOTALCUSTOMERS TOTALKBTU
## <dbl> <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 92304 1 2017 Gas- Commerc~ Y 0 0
## 2 92304 2 2017 Gas- Commerc~ Y 0 0
## 3 92304 3 2017 Gas- Commerc~ Y 0 0
## 4 92365 1 2017 Gas- Commerc~ Y 0 0
## 5 92365 2 2017 Gas- Commerc~ Y 0 0
## 6 92365 3 2017 Gas- Commerc~ Y 0 0
## 7 93203 1 2017 Gas- Commerc~ Y 0 0
## 8 93203 2 2017 Gas- Commerc~ Y 0 0
## 9 93203 3 2017 Gas- Commerc~ Y 0 0
## 10 93204 1 2017 Gas- Commerc~ Y 0 0
## # ... with 185,347 more rows, and 1 more variable: AVERAGEKBTU <dbl>
The following code sets up the Bay Area counties, cities, zip codes, and associated geometry:
bay_county_names <-
c(
"Alameda",
"Contra Costa",
"Marin",
"Napa",
"San Francisco",
"San Mateo",
"Santa Clara",
"Solano",
"Sonoma"
)
bay_counties <-
counties("CA", cb = T, progress_bar = F) %>%
filter(NAME %in% bay_county_names)
usa_zips <-
zctas(cb = T, progress_bar = F)
bay_zips <-
usa_zips %>%
st_centroid() %>%
.[bay_counties, ] %>%
st_set_geometry(NULL) %>%
left_join(usa_zips %>% select(GEOID10)) %>%
st_as_sf()
ca_cities <- places("CA", cb = T, progress_bar = FALSE)
bay_cities <- ca_cities[bay_counties, ]
bay_cities_within <-
ca_cities %>%
st_centroid() %>%
.[bay_counties, ] %>%
st_set_geometry(NULL) %>%
left_join(ca_cities %>% select(GEOID)) %>%
st_as_sf()
I now filter the PGE_Elec_Gas combined dataset only to Residential and Commercial gas and electricity usage corresponding to each zip code, month and year.
pge_final <-
pge_elec_gas %>%
mutate(
ZIPCODE = ZIPCODE %>% as.character()
)%>%
filter(
CUSTOMERCLASS %in%
c(
"Elec- Residential",
"Elec- Commercial",
"Gas- Residential",
"Gas- Commercial"
)
) %>%
select(
!c(COMBINED, AVERAGEKBTU)
) %>%
group_by(ZIPCODE, MONTH, YEAR, CUSTOMERCLASS) %>%
summarize(
TOTALKBTU =
sum(
TOTALKBTU,
na.rm = T
),
TOTALCUSTOMERS =
sum(
TOTALCUSTOMERS,
na.rm = T
)
) %>%
mutate(
AVERAGEKBTU =
TOTALKBTU/TOTALCUSTOMERS
) %>%
right_join(
bay_zips %>% select(GEOID10),
by = c("ZIPCODE" = "GEOID10")
) %>%
st_as_sf() %>%
st_transform(4326)
pge_final$DATE <- as.yearmon(paste(pge_final$YEAR, pge_final$MONTH), "%Y %m")
This following code reverses an apparent error with PG&E data that doubles customer and energy usage values for September 2017, and the period from June 2019 to December 2019:
pge_final_sep17 <- pge_final %>%
filter(DATE == "Sep 2017") %>%
mutate(TOTALKBTU = TOTALKBTU/2, TOTALCUSTOMERS = TOTALCUSTOMERS/2, AVERAGEKBTU = AVERAGEKBTU/2)
pge_final_wo_sep17 <- pge_final %>%
filter(DATE != "Sep 2017")
pge_final_2 <- rbind(pge_final_sep17, pge_final_wo_sep17)
pge_final_2half_19 <- pge_final_2 %>%
filter(MONTH %in% 7:12, YEAR == 2019) %>%
mutate(TOTALKBTU = TOTALKBTU/2, TOTALCUSTOMERS = TOTALCUSTOMERS/2, AVERAGEKBTU = AVERAGEKBTU/2)
pge_final_wo_2half_19 <- pge_final_2 %>%
filter(YEAR != 2019)
pge_final_1half_19 <- pge_final_2 %>%
filter(MONTH %in% 1:6, YEAR ==2019)
pge_final_3 <- rbind(pge_final_2half_19, pge_final_1half_19, pge_final_wo_2half_19)
pge_final_3
## Simple feature collection with 44662 features and 8 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -123.5335 ymin: 36.89303 xmax: -121.3083 ymax: 38.93713
## geographic CRS: WGS 84
## # A tibble: 44,662 x 9
## # Groups: ZIPCODE, MONTH, YEAR [12,183]
## ZIPCODE MONTH YEAR CUSTOMERCLASS TOTALKBTU TOTALCUSTOMERS AVERAGEKBTU
## * <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 94002 7 2019 Elec- Commer~ 14755515. 712 10362.
## 2 94002 7 2019 Elec- Reside~ 12483570. 10681 584.
## 3 94002 7 2019 Gas- Commerc~ 0 0 NaN
## 4 94002 7 2019 Gas- Residen~ 18881800 9463 998.
## 5 94002 8 2019 Elec- Commer~ 15809645. 710 11134.
## 6 94002 8 2019 Elec- Reside~ 13606592. 10697 636.
## 7 94002 8 2019 Gas- Commerc~ 0 0 NaN
## 8 94002 8 2019 Gas- Residen~ 17303900 9478 913.
## 9 94002 9 2019 Elec- Commer~ 15255124. 712 10713.
## 10 94002 9 2019 Elec- Reside~ 13553744. 10707 633.
## # ... with 44,652 more rows, and 2 more variables: geometry <MULTIPOLYGON [°]>,
## # DATE <yearmon>
The total energy consumption in kBTU is ordered by month and categorized by energy type and location in the plot below:
pge_chart <-
pge_final_3 %>%
ggplot() +
geom_bar(
aes(
x = DATE %>% as.Date(),
y = TOTALKBTU,
fill = CUSTOMERCLASS
),
stat = "identity",
position = "stack"
) +
labs(
x = "Month",
y = "kBTU",
title = "Bay Area Monthly Energy Usage, PG&E 2017-2020",
fill = "Energy Type"
)
pge_final_chart <- pge_chart %>%
ggplotly() %>%
layout(
xaxis = list(fixedrange = T, tickangle = -90),
yaxis = list(fixedrange = T)
) %>%
config(displayModeBar = F)
pge_final_chart
Next, I take the PGE_Final dataset from above. To determine the effect of COVID-19 on residential energy consumption, I assume energy consumption from March to June 2019 as the baseline. The following process displays the baseline (2019) consumption and the adjusted (2020) consumption and computes the difference and percent change.
pge_bay_change <-
pge_final_3 %>%
filter(YEAR %in% 2019:2020, MONTH %in% (3:6), CUSTOMERCLASS == "Elec- Residential") %>%
st_set_geometry(NULL) %>%
select(!c(MONTH, TOTALCUSTOMERS, DATE, AVERAGEKBTU)) %>%
pivot_wider(
names_from = YEAR,
values_from = TOTALKBTU
) %>%
rename(
kBTU_2019 = "2019",
kBTU_2020 = "2020"
) %>%
right_join(
bay_zips %>% select(GEOID10),
by = c("ZIPCODE" = "GEOID10")
) %>%
st_as_sf() %>%
st_transform(4326)
pge_bay_change$kBTU_Change <- pge_bay_change$kBTU_2020 - pge_bay_change$kBTU_2019
pge_bay_change$Percent_Change <- pge_bay_change$kBTU_Change/pge_bay_change$kBTU_2019 * 100
Finally, the percent differences are visually represented through the following mapping analysis:
res_pal <- colorNumeric(
palette = "plasma",
domain =
pge_bay_change$Percent_Change
)
leaflet() %>%
addTiles() %>%
addPolygons(
data = pge_bay_change %>% na.omit(),
fillColor = ~res_pal(Percent_Change),
color = "black",
opacity = 0.5,
fillOpacity = 0.5,
weight = 1,
label = ~paste0(
round(Percent_Change),
"% Electricity Usage Change in ",
ZIPCODE
),
highlightOptions = highlightOptions(
weight = 2,
opacity = 1
)
) %>%
addLegend(
data = pge_bay_change,
pal = res_pal,
values = ~Percent_Change,
title = "Percent Change in Electricity Usage, 2019 vs. 2020"
)
The first part of this assignment determines the annual cycle of Pacific Gas and Electric (PG&E) gas and electricity usage between January 2017 and June 2020 for the 9-county San Francisco Bay Area. A visual representation of the data can be found in the resulting plot above. In summary, annual energy usage peaks in the winter due to the need to heat homes and power holiday decorations. Energy usage dips in the summer despite the warmth of the season, likely because of the relatively cooler summers experienced in the Bay Area. By observation, it appears that the COVID-19 pandemic resulted in greater energy consumption than usual.
This point is numericalized in the second part of the assignment, where I compare electricity usage in the Bay Area between 2019 and 2020, specifically in the months of March and June of both years. The goal is to determine the magnitude of the increase in electricity use as a result of the shelter-in-place order issued during the COVID-19 pandemic. Here, I assumed the four months of March 2019-June 2019 as the baseline, or the energy usage in a “normal” year. I also assumed that the number of PG&E customers remained constant (or negligibly increased/decreased) during the between 2019 and 2020. I contrast these amounts with those from March 2020-June 2020.
The analysis reveals a median increase in residential electricity use of 7.5% between 2019 and 2020 in the Bay Area. One point of interest is that two of the largest increases are located in Downtown Oakland (+32%) and Downtown San Jose (+20%). This could be explained by the proportionally higher amount of high-density/transit-oriented developments in these downtown areas. With more transit-dependent residents (seniors, for example) required to stay home during the pandemic, this likely caused the larger percentage increase in electricity consumption.